The sequences of the DNA regions downstream of the
20S proteasome structural genes prcB₁A₁ (6 kb) and
prcB₂A₂ (3.3 kb) of Rhodococcus erythropolis
NI86/21 were determined. A highly conserved gene
organization was observed between the two clusters,
which differed significantly in G + C content (68.8%
versus 62.6%). Several ORFs were homologues of
putative genes previously identified by genomic
sequencing of the equivalent DNA in the related
nocardioform actinomycete, Mycobacterium leprae, and
thought to be specific for this pathogen. Three ORFs
(ORF8₁, ORF8₂, ORF12₁) without a counterpart in
M. leprae were found. No significant homology to known
sequences, including proteasome-related gene products,
was detected, except for ORF9₁ and ORF9₂, which
display a high level of sequence identity with a
partially sequenced ORF in Streptomyces chrysomallus.
These downstream ORFs also show a significant level
of sequence homology with ORF6₁ and ORF6₂,
which are located upstream of the proteasome
structural genes in the respective clusters.

Keywords: Actinomycetes, gene organization, Mycobacterium,
proteasome, Rhodococcus, Streptomyces



The eukaryotic 26S proteasome is the central
multisubunit protease of the ubiquitin pathway
of protein degradation and is formed by a 20S
core complex and two polar 19S complexes
(Hochstrasser, 1995; Goldberg et al., 1995). The
20S proteasome constitutes the proteolytic core of
this protease (Seemüller et al., 1995) and is
composed of fourteen related but different subunits,
which are either of the α-type or of the β-type.
The barrel-shaped 20S particle consists of four
seven-membered rings with α-type subunits in
the outer rings and β-type subunits in the inner
rings (Lupas et al., 1993). The 20S proteasome
which was discovered in the archaebacterium
Thermoplasma acidophilum is formed of only two
subunits, α and β (Dahlmann et al., 1989). The
same architecture was reported for a second
archaebacterial 20S proteasome, recently isolated
from Methanosarcina thermophila (Maupin-Furlow
and Ferry, 1995). Remarkably, the first eubacterial
20S proteasome, discovered in Rhodococcus
erythropolis NI86/21, contains two α and two
β subunits (Tamura et al., 1995). The α₁/β₁ and
α₂/β₂ subunits are encoded by two unlinked
gene pairs, prcB₁A₁ and prcB₂A₂, respectively. The
subscript numbers refer to corresponding DNA
regions containing the genes. Until now, no
bacterial counterpart of the eukaryotic regulatory
19S particle or of the enzymes of the
ubiquitin-conjugating machinery has been identified,
although there is evidence for the presence
of ubiquitin in T. acidophilum (Wolf et al.,
1993) and Anabaena variabilis (Durner and Böger,
1995). As part of our search for such
proteasome-related genes, we sequenced the DNA regions
downstream of both proteasome gene pairs.

The 357 bp SmaI-BamHI fragment of prcA₁
(Tamura et al., 1995) was used as a probe in
plaque hybridization to screen a λEMBL3 library
of Sau3A-digested genomic DNA of R. erythropolis
NI86/21. The insert of λFAJ2030 was found to
contain about 6 kb of downstream sequence, the
remainder of the insert overlapping with the
previously characterized DNA fragment in
λFAJ2028 (Nagy et al., 1995; Tamura et al., 1995).
In addition, 3.3 kb of the region downstream of
prcA₂, contained in λFAJ2029 (Tamura et al.,
1995), was sequenced. DNA sequencing of both
strands on overlapping fragments subcloned in
pUC19 was carried out with an automated
sequencer (A.L.F., Pharmacia Biotech). The PC/GENE
software (IntelliGenetics) was used for
sequence analyses. Potential coding regions were
identified with the programs GCWIND (Shields
et al., 1992) and FRAME (Bibb et al., 1984).
Homology searches were performed using the
FASTA, BLAST, and BLOCKS e-mail servers.

The gene organization in both clusters of R.
erythropolis NI86/21 is shown in Figure 1. All
ORFs, except ORF8₁ and ORF8₂, were located on
the same strand as the structural proteasomal
genes. ORF10₁, ORF11₁, and ORF12₁ may be
translationally coupled since the stop and start
codons of the adjacent ORFs overlap by two base
pairs. Apparently, the highly similar gene
organization previously observed for the two
proteasome gene clusters (Tamura et al., 1995) extends
into the downstream region. We previously 
pointed out that the two clusters differ signifi- 
cantly in GC content. The sequence data for the 
downstream regions confirm this observation. 
The DNA region with prcB₂A₂ (62.6%) has a
significantly lower proportion of G and C than the
prcB₁A₁ region (68.8%). The data from genomic
sequencing of Mycobacterium leprae cosmid B2126
reveal a quite similar arrangement of genes and 
ORFs. Since the M. leprae genome has not yet been 
completely sequenced, and proteasomes have not 
yet been isolated from this pathogen, it is not 
known whether a second set of proteasome genes 
exists in M. leprae as well. Two major differences 
with Rhodococcus are apparent. No equivalent of 
ORF8₁ and ORF8₂ is found adjacent to C3_260
which represents the putative prcA gene of M. lep- 
rae. Instead, a 2.4-kb DNA region without obvious 
ORFs is present in Mycobacterium. Also, an
equivalent of the rhodococcal ORF12₁ is missing. The
GC content of the DNA region in Mycobacterium 
(59.2%) is closer to the value for the
prcB₂A₂-containing fragment.
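
The G + C percentages compared above are base-composition ratios: the number of G and C residues divided by the number of unambiguous bases. A minimal sketch of that arithmetic is given here. The function name `gc_content` and the toy sequences are illustrative assumptions; this is not the PC/GENE routine used for the analysis, and the sequences are not taken from either cluster.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases of seq."""
    seq = seq.upper()
    # Count only unambiguous bases; symbols such as N are ignored.
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Toy sequences for illustration only (hypothetical, not cluster data).
example_high = "GCGCGGCCAT"  # 8 of 10 bases are G or C
example_low = "ATGCATATTA"   # 2 of 10 bases are G or C
print(f"{gc_content(example_high):.1f}")  # 80.0
print(f"{gc_content(example_low):.1f}")   # 20.0
```

Applied to the full sequence of each region, a calculation of this kind would yield single values comparable to the 68.8%, 62.6% and 59.2% figures quoted above.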

The extent of sequence conservation between
individual ORFs is shown in Table I. The high level
of sequence identity between Rhodococcus and
Mycobacterium reflects the close phylogenetic
relatedness of these bacteria, both belonging to the
nocardioform actinomycete cluster of the high-GC
gram-positive bacteria. It is likely that the
preliminary sequence data for M. leprae contain a
sequencing error between C2_219 and C3_265, and
between C2_220 and C1_181, since the introduction
of a frameshift in these parts extends the homology
over the entire length with ORF9₁/ORF9₂ and
ORF11₁, respectively. Remarkably, the downstream
ORF9₁ and ORF9₂ also display significant
homology (about 40% identity over the entire
length) with ORF6₁ and ORF6₂ (C1_172 in
Mycobacterium), which are located upstream of the
prcB₁ and prcB₂ genes, respectively (prcB in
Mycobacterium). Apart from ORF9₁ and ORF9₂, no

FIGURE 1 Gene organization in the DNA regions of Rhodococcus erythropolis NI86/21 containing the structural genes for the 20S
proteasome: cluster R₁ with prcB₁A₁ and cluster R₂ with prcB₂A₂. The sequences of the DNA fragments carrying prcB₁A₁ (up to site;
accession number U26421) and prcB₂A₂ (up to PstI site; accession number U26422) were reported previously (Tamura et al., 1995). The
numbers refer to the different ORFs represented by arrows. The equivalent region of the Mycobacterium leprae (ML) cosmid B2126 with
its preliminary annotation (accession number U00017) is shown for comparison. Dashed lines are used to delineate blocks of gene
organization conserved between M. leprae and Rhodococcus. The positions of putative frameshifts in the M. leprae sequence are indicated
with black dots. Filled arrows represent Rhodococcus ORFs with no counterpart in the equivalent M. leprae DNA region. One scale
division represents 500 bp. The sequences have been submitted to the EMBL database with accession numbers Z82004 and Z82005.




apparent homologues of the different rhodococcal
and mycobacterial ORFs are currently known. The
C-terminal parts of ORF9₁ and ORF9₂ display
strong homology (64% identity in a 251 aa overlap)
with the partially sequenced ORFA located
upstream of the immunophilin gene fkbB of
Streptomyces chrysomallus (Pahl and Keller, 1994).
These independently obtained sequence data also
suggest that C2_219 and C3_265 most probably
form one contiguous ORF, as found in Rhodococcus
for ORF9₁ and ORF9₂. Using the method of
Dodd and Egan (1990), a potential helix-turn-helix
motif (SAAEAAAELCVTITQIMSDLN, ending at residue 46) was
predicted in the N-terminal part of ORF11₁, but no
significant homology with known DNA-binding
proteins was found. Since true homologues of most
ORFs identified by genomic sequencing in this
region of the M. leprae DNA have now been
identified in R. erythropolis NI86/21, a non-pathogenic
nocardioform actinomycete, these ORFs should no
longer be considered specific for this mycobacterial
pathogen. However, at present no predictions can
be made about the possible functions of the
characterized ORFs.

TABLE I Homology between putative gene products from Rhodococcus erythropolis NI86/21 and
Mycobacterium leprae. Clusters R₁ and R₂ refer to the DNA regions downstream of prcA₁ and prcA₂,
respectively. N is used to indicate that no sequence data are available and A to denote the absence of a
homologue.

Rhodococcus                               Mycobacterium
Cluster R₁  Cluster R₂  % Identity        Homologue  % Identity with homologue in
                                                     Cluster R₁       Cluster R₂

ORF8₁       ORF8₂       62.8% (487 aa)    A
ORF9₁       ORF9₂       95.1% (447 aa)    C2_219*    91% (302 aa)     90% (302 aa)
                                          C3_265†    84% (132 aa)     85% (132 aa)
ORF10₁      ORF10₂‡     90.6% (85 aa)     C3_266     57% (330 aa)     72.9% (85 aa)
ORF11₁      N                             C2_220*    62.6% (214 aa)
                                          C1_181†    43.8% (73 aa)
ORF12₁      N                             A
ORF13₁      N                             C1_182     50% (88 aa)
ORF14₁‡     N                             C1_183     75% (-14 aa)

* Homologous to N-terminal part of Rhodococcus ORF(s)
† Homologous to C-terminal part of Rhodococcus ORF(s)
‡ Truncated ORF (no further sequence data available)
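
The identity values in Table I are percentages of identical residues over an aligned overlap of a given length. The sketch below shows one way such a figure could be derived from two sequences that have already been aligned. The function name `percent_identity`, the toy peptides, and the choice to exclude gapped columns from the overlap are all assumptions made for illustration; the FASTA and BLAST servers used in this work may count overlaps differently.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (% identical residues, overlap length) for two pre-aligned sequences.

    Columns in which either sequence carries a gap are left out of the overlap
    (an assumed convention, not necessarily that of FASTA or BLAST).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        raise ValueError("no overlapping residues")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


# Hypothetical toy alignment, not taken from any ORF in Table I.
identity, overlap = percent_identity("MKTA-LLV", "MRTAGLIV")
print(f"{identity:.1f}% ({overlap} aa)")  # 71.4% (7 aa)
```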